Exploration server on OCR

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information it contains has therefore not been validated.

Building Specialized Multilingual Lexical Graphs Using Community Resources

Internal identifier: 000146 (France/Analysis); previous: 000145; next: 000147

Authors: Mohammad Daoud [France]; Christian Boitet [France]; Kyo Kageura [Japan]; Asanobu Kitamoto [Japan]; Mathieu Mangeot [France]; Daoud Daoud [Jordan]

Source: Lecture Notes in Computer Science, 2010 (ISSN 0302-9743)

RBID: ISTEX:7E30D684F9AB8523E45902BC589A219A2A9A9142

Abstract

We describe methods for compiling domain-dedicated multilingual terminological data from various resources. We focus on online community users as the main source, so our approach relies both on acquiring contributions from volunteers (explicit approach) and on analyzing users' behavior to extract interesting patterns and facts (implicit approach). As a generic repository for the collected multilingual terminological data, we describe the concept of dedicated Multilingual Preterminological Graphs (MPGs), together with some automatic approaches for constructing them by analyzing the behavior of online community users. A Multilingual Preterminological Graph is a special lexical resource that contains a massive number of terms related to a specific domain. We call it preterminological because it is raw material that can be used to build a standardized terminological repository. Building such a graph with traditional approaches is difficult, as it requires huge efforts from domain specialists and terminologists. In our approach, we build the graph by analyzing the access log files of the community's website, identifying the important terms that users have searched for on that website, and finding how these terms are associated with each other. We aim to make this graph a seed repository to which multilingual volunteers can contribute. We are experimenting with this approach on the Digital Silk Road (DSR) Project. We used its access log files, going back to the project's beginning in 2003, and obtained an initial graph of around 116,000 terms. As an application, we used this graph to obtain a preterminological multilingual database that serves a cross-language information retrieval (CLIR) system for the DSR project.
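The abstract describes the implicit approach only in outline: collect the terms users searched for in the site's access logs, then link terms that are associated with each other. The Python sketch below illustrates that idea under several assumptions that do not come from the paper: the logs use the Apache common/combined format, search requests carry the query in a hypothetical `q` parameter, a client IP address stands in for a user session, and two terms count as associated when the same client searched for both. It builds only a monolingual term graph with weighted co-occurrence edges; the multilingual links contributed by volunteers are outside its scope, and it is not the authors' actual method. The sample log lines are invented.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations
from urllib.parse import parse_qs, urlsplit

# Matches the start of an Apache common/combined log line: client IP, then the request target.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:GET|POST) (\S+) [^"]*"')

# Hypothetical name of the query-string parameter that carries the search terms.
SEARCH_PARAM = "q"


def search_terms(log_lines):
    """Yield (client, term) pairs for every search request found in the log."""
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip malformed lines and methods other than GET/POST
        client, target = match.groups()
        # parse_qs decodes "+" and percent-escapes, e.g. "silk%20road" -> "silk road"
        for value in parse_qs(urlsplit(target).query).get(SEARCH_PARAM, []):
            term = " ".join(value.lower().split())  # normalize case and spacing
            if term:
                yield client, term


def build_graph(log_lines, min_count=1):
    """Return (search counts per term, weighted co-occurrence edges).

    A client IP stands in for a user session (a simplification). Two kept terms
    are linked when the same client searched for both; the edge weight counts
    how many clients did so.
    """
    freq = Counter()
    terms_by_client = defaultdict(set)
    for client, term in search_terms(log_lines):
        freq[term] += 1
        terms_by_client[client].add(term)

    keep = {t for t, c in freq.items() if c >= min_count}
    edges = Counter()
    for terms in terms_by_client.values():
        for a, b in combinations(sorted(terms & keep), 2):
            edges[(a, b)] += 1
    return Counter({t: freq[t] for t in keep}), edges


# Invented sample lines (documentation IP ranges), for illustration only.
SAMPLE_LOG = [
    '192.0.2.1 - - [10/Oct/2009:13:55:36 +0900] "GET /search?q=Silk+Road HTTP/1.1" 200 512',
    '192.0.2.1 - - [10/Oct/2009:13:56:02 +0900] "GET /search?q=Dunhuang HTTP/1.1" 200 734',
    '198.51.100.7 - - [10/Oct/2009:14:01:10 +0900] "GET /search?q=silk%20road HTTP/1.1" 200 512',
]

if __name__ == "__main__":
    frequencies, edges = build_graph(SAMPLE_LOG)
    print(frequencies.most_common())
    print(edges.most_common())
```

On real logs, raising `min_count` is one simple way to keep only frequently searched terms before linking them; a real system would also need proper session detection rather than grouping by IP address.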

URL: https://api.istex.fr/document/7E30D684F9AB8523E45902BC589A219A2A9A9142/fulltext/pdf
DOI: 10.1007/978-3-642-14415-8_7



The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Building Specialized Multilingual Lexical Graphs Using Community Resources</title>
<author>
<name sortKey="Daoud, Mohammad" sort="Daoud, Mohammad" uniqKey="Daoud M" first="Mohammad" last="Daoud">Mohammad Daoud</name>
</author>
<author>
<name sortKey="Boitet, Christian" sort="Boitet, Christian" uniqKey="Boitet C" first="Christian" last="Boitet">Christian Boitet</name>
</author>
<author>
<name sortKey="Kageura, Kyo" sort="Kageura, Kyo" uniqKey="Kageura K" first="Kyo" last="Kageura">Kyo Kageura</name>
</author>
<author>
<name sortKey="Kitamoto, Asanobu" sort="Kitamoto, Asanobu" uniqKey="Kitamoto A" first="Asanobu" last="Kitamoto">Asanobu Kitamoto</name>
</author>
<author>
<name sortKey="Mangeot, Mathieu" sort="Mangeot, Mathieu" uniqKey="Mangeot M" first="Mathieu" last="Mangeot">Mathieu Mangeot</name>
</author>
<author>
<name sortKey="Daoud, Daoud" sort="Daoud, Daoud" uniqKey="Daoud D" first="Daoud" last="Daoud">Daoud Daoud</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:7E30D684F9AB8523E45902BC589A219A2A9A9142</idno>
<date when="2010" year="2010">2010</date>
<idno type="doi">10.1007/978-3-642-14415-8_7</idno>
<idno type="url">https://api.istex.fr/document/7E30D684F9AB8523E45902BC589A219A2A9A9142/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000193</idno>
<idno type="wicri:Area/Istex/Curation">000190</idno>
<idno type="wicri:Area/Istex/Checkpoint">000320</idno>
<idno type="wicri:doubleKey">0302-9743:2010:Daoud M:building:specialized:multilingual</idno>
<idno type="wicri:Area/Main/Merge">000745</idno>
<idno type="wicri:Area/Main/Curation">000740</idno>
<idno type="wicri:Area/Main/Exploration">000740</idno>
<idno type="wicri:Area/France/Extraction">000146</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Building Specialized Multilingual Lexical Graphs Using Community Resources</title>
<author>
<name sortKey="Daoud, Mohammad" sort="Daoud, Mohammad" uniqKey="Daoud M" first="Mohammad" last="Daoud">Mohammad Daoud</name>
<affiliation wicri:level="4">
<country xml:lang="fr">France</country>
<wicri:regionArea>Grenoble Informatics Laboratory, GETALP, Université Joseph Fourier, 385, rue de la Bibliothèque, 38041, Grenoble</wicri:regionArea>
<placeName>
<region type="region" nuts="2">Auvergne-Rhône-Alpes</region>
<region type="old region" nuts="2">Rhône-Alpes</region>
<settlement type="city">Grenoble</settlement>
</placeName>
<orgName type="university">Université Joseph Fourier</orgName>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">France</country>
</affiliation>
</author>
<author>
<name sortKey="Boitet, Christian" sort="Boitet, Christian" uniqKey="Boitet C" first="Christian" last="Boitet">Christian Boitet</name>
<affiliation wicri:level="4">
<country xml:lang="fr">France</country>
<wicri:regionArea>Grenoble Informatics Laboratory, GETALP, Université Joseph Fourier, 385, rue de la Bibliothèque, 38041, Grenoble</wicri:regionArea>
<placeName>
<region type="region" nuts="2">Auvergne-Rhône-Alpes</region>
<region type="old region" nuts="2">Rhône-Alpes</region>
<settlement type="city">Grenoble</settlement>
</placeName>
<orgName type="university">Université Joseph Fourier</orgName>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">France</country>
</affiliation>
</author>
<author>
<name sortKey="Kageura, Kyo" sort="Kageura, Kyo" uniqKey="Kageura K" first="Kyo" last="Kageura">Kyo Kageura</name>
<affiliation wicri:level="3">
<country xml:lang="fr">Japon</country>
<wicri:regionArea>Library and Information Science Laboratory, Graduate School of Education, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, 113-0033, Tokyo</wicri:regionArea>
<placeName>
<settlement type="city">Tokyo</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Japon</country>
</affiliation>
</author>
<author>
<name sortKey="Kitamoto, Asanobu" sort="Kitamoto, Asanobu" uniqKey="Kitamoto A" first="Asanobu" last="Kitamoto">Asanobu Kitamoto</name>
<affiliation wicri:level="3">
<country>Japon</country>
<placeName>
<settlement type="city">Tokyo</settlement>
</placeName>
<wicri:orgArea>The National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, 101-8430</wicri:orgArea>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Japon</country>
</affiliation>
</author>
<author>
<name sortKey="Mangeot, Mathieu" sort="Mangeot, Mathieu" uniqKey="Mangeot M" first="Mathieu" last="Mangeot">Mathieu Mangeot</name>
<affiliation wicri:level="4">
<country xml:lang="fr">France</country>
<wicri:regionArea>Grenoble Informatics Laboratory, GETALP, Université Joseph Fourier, 385, rue de la Bibliothèque, 38041, Grenoble</wicri:regionArea>
<placeName>
<region type="region" nuts="2">Auvergne-Rhône-Alpes</region>
<region type="old region" nuts="2">Rhône-Alpes</region>
<settlement type="city">Grenoble</settlement>
</placeName>
<orgName type="university">Université Joseph Fourier</orgName>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">France</country>
</affiliation>
</author>
<author>
<name sortKey="Daoud, Daoud" sort="Daoud, Daoud" uniqKey="Daoud D" first="Daoud" last="Daoud">Daoud Daoud</name>
<affiliation>
<wicri:noCountry code="subField">Al-Jubaiha</wicri:noCountry>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Jordanie</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<imprint>
<date>2010</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
</series>
<idno type="istex">7E30D684F9AB8523E45902BC589A219A2A9A9142</idno>
<idno type="DOI">10.1007/978-3-642-14415-8_7</idno>
<idno type="ChapterID">7</idno>
<idno type="ChapterID">Chap7</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: We are describing methods for compiling domain-dedicated multilingual terminological data from various resources. We focus on collecting data from online community users as a main source, therefore, our approach depends on acquiring contributions from volunteers (explicit approach), and it depends on analyzing users’ behaviors to extract interesting patterns and facts (implicit approach). As a generic repository that can handle the collected multilingual terminological data, we are describing the concept of dedicated Multilingual Preterminological Graphs MPGs, and some automatic approaches for constructing them by analyzing the behavior of online community users. A Multilingual Preterminological Graph is a special lexical resource that contains massive amount of terms related to a special domain. We call it preterminological, because it is a raw material that can be used to build a standardized terminological repository. Building such a graph is difficult using traditional approaches, as it needs huge efforts by domain specialists and terminologists. In our approach, we build such a graph by analyzing the access log files of the website of the community, and by finding the important terms that have been used to search in that website, and their association with each other. We aim at making this graph as a seed repository so multilingual volunteers can contribute. We are experimenting this approach with the Digital Silk Road Project. We have used its access log files since its beginning in 2003, and obtained an initial graph of around 116000 terms. As an application, we used this graph to obtain a preterminological multilingual database that is serving a CLIR system for the DSR project.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>France</li>
<li>Japon</li>
<li>Jordanie</li>
</country>
<region>
<li>Auvergne-Rhône-Alpes</li>
<li>Rhône-Alpes</li>
</region>
<settlement>
<li>Grenoble</li>
<li>Tokyo</li>
</settlement>
<orgName>
<li>Université Joseph Fourier</li>
</orgName>
</list>
<tree>
<country name="France">
<region name="Auvergne-Rhône-Alpes">
<name sortKey="Daoud, Mohammad" sort="Daoud, Mohammad" uniqKey="Daoud M" first="Mohammad" last="Daoud">Mohammad Daoud</name>
</region>
<name sortKey="Boitet, Christian" sort="Boitet, Christian" uniqKey="Boitet C" first="Christian" last="Boitet">Christian Boitet</name>
<name sortKey="Daoud, Mohammad" sort="Daoud, Mohammad" uniqKey="Daoud M" first="Mohammad" last="Daoud">Mohammad Daoud</name>
<name sortKey="Mangeot, Mathieu" sort="Mangeot, Mathieu" uniqKey="Mangeot M" first="Mathieu" last="Mangeot">Mathieu Mangeot</name>
</country>
<country name="Japon">
<noRegion>
<name sortKey="Kageura, Kyo" sort="Kageura, Kyo" uniqKey="Kageura K" first="Kyo" last="Kageura">Kyo Kageura</name>
</noRegion>
<name sortKey="Kageura, Kyo" sort="Kageura, Kyo" uniqKey="Kageura K" first="Kyo" last="Kageura">Kyo Kageura</name>
<name sortKey="Kitamoto, Asanobu" sort="Kitamoto, Asanobu" uniqKey="Kitamoto A" first="Asanobu" last="Kitamoto">Asanobu Kitamoto</name>
</country>
<country name="Jordanie">
<noRegion>
<name sortKey="Daoud, Daoud" sort="Daoud, Daoud" uniqKey="Daoud D" first="Daoud" last="Daoud">Daoud Daoud</name>
</noRegion>
</country>
</tree>
</affiliations>
</record>
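The `<affiliations>` section groups the authors by country, and some names appear more than once. A minimal Python sketch for reading that section and listing the authors per country, dropping repeats by their `uniqKey` attribute, is given here. It parses only a trimmed copy of the `<tree>` element: as displayed on this page, the full record uses the `wicri:` prefix without declaring a namespace for it, so a namespace-aware parser such as `xml.etree.ElementTree` would reject the whole record. `authors_by_country` is a hypothetical helper, not part of Dilib, and country names are kept exactly as they appear in the record.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <affiliations><tree> element above (sortKey and similar
# attributes dropped); repeated <name> entries are kept to show deduplication.
TREE_XML = """
<tree>
  <country name="France">
    <region name="Auvergne-Rhône-Alpes">
      <name uniqKey="Daoud M">Mohammad Daoud</name>
    </region>
    <name uniqKey="Boitet C">Christian Boitet</name>
    <name uniqKey="Boitet C">Christian Boitet</name>
    <name uniqKey="Daoud M">Mohammad Daoud</name>
    <name uniqKey="Mangeot M">Mathieu Mangeot</name>
  </country>
  <country name="Japon">
    <noRegion>
      <name uniqKey="Kageura K">Kyo Kageura</name>
    </noRegion>
    <name uniqKey="Kitamoto A">Asanobu Kitamoto</name>
  </country>
  <country name="Jordanie">
    <noRegion>
      <name uniqKey="Daoud D">Daoud Daoud</name>
    </noRegion>
  </country>
</tree>
"""


def authors_by_country(tree_xml):
    """Map each country name to its authors, in document order, without repeats."""
    root = ET.fromstring(tree_xml.strip())
    result = {}
    for country in root.findall("country"):
        seen = set()
        names = []
        # iter() also descends into <region> and <noRegion> children
        for name in country.iter("name"):
            key = name.get("uniqKey", name.text)
            if key not in seen:
                seen.add(key)
                names.append(name.text)
        result[country.get("name")] = names
    return result


if __name__ == "__main__":
    for country, names in authors_by_country(TREE_XML).items():
        print(country, names)
```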

To process this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/France/Analysis
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000146 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/France/Analysis/biblio.hfd -nk 000146 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    France
   |étape=   Analysis
   |type=    RBID
   |clé=     ISTEX:7E30D684F9AB8523E45902BC589A219A2A9A9142
   |texte=   Building Specialized Multilingual Lexical Graphs Using Community Resources
}}
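The template above is plain wikitext, so a link to another record can be produced by filling in the same parameters. The hypothetical `explor_link` helper below renders the template from a dictionary, keeping the parameter names exactly as the template writes them (including the French `étape`, `clé` and `texte`) and padding each `name=` to nine characters to match the alignment used on this page; that padding is assumed to be purely cosmetic. It is a convenience sketch, not a Wicri or Dilib tool.

```python
# Parameters of the {{Explor lien}} template for this page, copied from above.
PAGE_FIELDS = {
    "wiki": "Ticri/CIDE",
    "area": "OcrV1",
    "flux": "France",
    "étape": "Analysis",
    "type": "RBID",
    "clé": "ISTEX:7E30D684F9AB8523E45902BC589A219A2A9A9142",
    "texte": "Building Specialized Multilingual Lexical Graphs Using Community Resources",
}


def explor_link(fields):
    """Render an {{Explor lien}} template, one '|name=value' line per parameter."""
    lines = ["{{Explor lien"]
    for param, value in fields.items():
        # Pad "name=" to 9 characters so the values line up as on this page.
        lines.append(f"   |{param + '=':<9}{value}")
    lines.append("}}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(explor_link(PAGE_FIELDS))
```

With `PAGE_FIELDS` as given, the output reproduces the template shown above line for line.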

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024